Chinese Sketch Engine and the Extraction of Grammatical Collocations

نویسندگان

  • Chu-Ren Huang
  • Adam Kilgarriff
  • Yi-Ching Wu
  • Chih-Ming Chiu
  • Simon Smith
  • Pavel Rychlý
  • Ming-Hong Bai
  • Keh-Jiann Chen
چکیده

This paper introduces a new technology for collocation extraction in Chinese. Sketch Engine (Kilgarriff et al., 2004) has proven to be a very effective tool for automatic description of lexical information, including collocation extraction, based on large-scale corpus. The original work of Sketch Engine was based on BNC. We extend Sketch Engine to Chinese based on Gigaword corpus from LDC. We discuss the available functions of the prototype Chinese Sketch Engine (CSE) as well as the robustness of language-independent adaptation of Sketch Engine. We conclude by discussing how Chinese-specific linguistic information can be incorporated to improve the

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Construction of a Chinese Collocational Knowledge Resource and Its Application for Second Language Acquisition

The appropriate use of collocations is a challenge for second language acquisition. However, high quality and easily accessible Chinese collocation resources are not available for both teachers and students. This paper presents the design and construction of a large scale resource of Chinese collocational knowledge, and a web-based application (OCCA, Online Chinese Collocation Assistant) which ...

متن کامل

Extracting Academic Subjects Semantic Relations Using Collocations

The paper presents approach to analyze semantic content of academic subjects and its internal relations using statistically-based techniques for collocation extraction from large electronic educational text corpus. It offers a survey and analysis of some related corpus-based approaches to extract conceptual relations used for educational purpose and presents a technique for semantic search of c...

متن کامل

Computing Thresholds of Linguistic Saliency

We propose and test several computational methods to automatically determine possible saliency cut-off points in Sketch Engine (Kilgarriff and Tugwell, 2001). Sketch Engine currently displays collocations in descending importance, as well as according to grammatical relations. However, Sketch Engine does not provide suggestions for a cut-off point such that any items above this cut-off point ma...

متن کامل

Augmented Comparative Corpora and Monitoring Corpus in Chinese: LIVAC and Sketch Search Engine Compared

The increasing availability of numerous corpora has significantly contributed to the understanding of words in terms of their underlying semantic structures and lexical networks (e.g. COBUILD, WordNet etc.). Through data mining and information retrieval, research in this area has vastly expanded our appreciation that what constitutes lexical knowledge goes beyond synonymy, hyponymy, metonymy, m...

متن کامل

Lexical and Grammatical Collocations in Writing Production of EFL Learners

Lewis (1993) recognized significance of word combinations including collocations by presenting lexical approach. Because of the crucial role of collocation in vocabulary acquisition, this research set out to evaluate the rate of collocations in Iranian EFL learners' writing production across L1 and L2. In addition, L1 interference with L2 collocational use in the learner' writing samples was st...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2005